{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Predictive Maintenance using Deep Learning\n",
    "## Fault Classification using deep learning with Keras\n",
    "\n",
    "#### Author Nagdev Amruthnath\n",
    "Date: 1/10/2019\n",
    "\n",
    "##### Citation Info\n",
    "If you are using this for your research, please use the following for citation. \n",
    "\n",
    "Amruthnath, Nagdev, and Tarun Gupta. \"A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance.\" In 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), pp. 355-361. IEEE, 2018.\n",
    "\n",
    "##### Disclaimer\n",
    "This is a tutorial for performing fault detection using machine learning. You this code at your own risk. I do not gurantee that this would work as shown below. If you have any suggestions please branch this project.\n",
    "\n",
    "## Introduction\n",
    "This is the first of four part demostration series of using machine learning for predictive maintenance.   \n",
    "\n",
    "The area of predictive maintenance has taken a lot of prominence in the last couple of years due to various reasons. With new algorithms and methodologies growing across different learning methods, it has remained a challenge for industries to adopt which method is fit, robust and provide most accurate detection. One the most common learning approaches used today for fault diagnosis is supervised learning. This is wholly based on the predictor variable and response variable. In this tutorial, we will be looking into deep learning models for fault classification using keras package in R.\n",
    "\n",
    "\n",
    "## Load libraries "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Skipping install of 'EnsembleML' from a github remote, the SHA1 (e4318dbf) has not changed since last install.\n",
      "  Use `force = TRUE` to force installation\n"
     ]
    }
   ],
   "source": [
    "#install custom Ensemble Package\n",
    "devtools::install_github(\"nagdevAmruthnath/EnsembleML\")\n",
    "library(EnsembleML)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-01-10T19:23:37.732876Z",
     "start_time": "2019-01-10T19:23:37.657Z"
    },
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Loading required package: lattice\n",
      "Loading required package: ggplot2\n",
      "\n",
      "Attaching package: ‘dplyr’\n",
      "\n",
      "The following objects are masked from ‘package:stats’:\n",
      "\n",
      "    filter, lag\n",
      "\n",
      "The following objects are masked from ‘package:base’:\n",
      "\n",
      "    intersect, setdiff, setequal, union\n",
      "\n"
     ]
    }
   ],
   "source": [
    "options(warn=-1)\n",
    "\n",
    "# load libraries\n",
    "library(caret)\n",
    "library(dplyr)\n",
    "library(EnsembleML)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load data\n",
    "Here we are using data from a bench press. There are total of four different states in this machine and they are split into four different csv files. We need to load the data first. In the data time represents the time between samples, ax is the acceleration on x axis, ay is the acceleration on y axis, az is the acceleration on z axis and at is the G's. The data was collected at sample rate of 100hz.   \n",
    "\n",
    "Four different states of the machine were collected  \n",
    "1. Nothing attached to drill press\n",
    "2. Wooden base attached to drill press\n",
    "3. Imbalance created by adding weight to one end of wooden base\n",
    "4. Imbalacne created by adding weight to two ends of wooden base."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-01-10T19:23:40.522397Z",
     "start_time": "2019-01-10T19:23:40.029Z"
    }
   },
   "outputs": [],
   "source": [
    "setwd(\"/home\")\n",
    "#read csv files\n",
    "file1 = read.csv(\"dry run.csv\", sep=\",\", header =T)\n",
    "file2 = read.csv(\"base.csv\", sep=\",\", header =T)\n",
    "file3 = read.csv(\"imbalance 1.csv\", sep=\",\", header =T)\n",
    "file4 = read.csv(\"imbalance 2.csv\", sep=\",\", header =T)\n",
    "\n",
    "#Add labels to data\n",
    "file1$label = 1\n",
    "file2$label = 2\n",
    "file3$label = 3\n",
    "file4$label = 4\n",
    "\n",
    "#view top rows of data\n",
    "#head(file1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can look at the summary of each file using summary function in R. Below, we can observe that 66 seconds long data is available. We also have min, max and mean for each of the variables. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-01-10T19:23:42.029410Z",
     "start_time": "2019-01-10T19:23:41.958Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "      time              ax                 ay                  az         \n",
       " Min.   : 0.002   Min.   :-2.11880   Min.   :-2.143600   Min.   :-4.1744  \n",
       " 1st Qu.:16.507   1st Qu.:-0.41478   1st Qu.:-0.625250   1st Qu.:-0.7359  \n",
       " Median :33.044   Median : 0.02960   Median :-0.022050   Median :-0.1468  \n",
       " Mean   :33.037   Mean   : 0.01233   Mean   : 0.008697   Mean   :-0.1021  \n",
       " 3rd Qu.:49.535   3rd Qu.: 0.46003   3rd Qu.: 0.641700   3rd Qu.: 0.4298  \n",
       " Max.   :66.033   Max.   : 2.09620   Max.   : 2.003000   Max.   : 4.9466  \n",
       "       aT            label  \n",
       " Min.   :0.032   Min.   :1  \n",
       " 1st Qu.:0.848   1st Qu.:1  \n",
       " Median :1.169   Median :1  \n",
       " Mean   :1.277   Mean   :1  \n",
       " 3rd Qu.:1.579   3rd Qu.:1  \n",
       " Max.   :5.013   Max.   :1  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# summary of each file\n",
    "summary(file1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Aggregration and feature extraction\n",
    "Here, the data is aggregated by 1 minute and features are extracted. Features are extracted to reduce the dimension of the data and only storing the representation of the data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-01-10T19:23:46.121459Z",
     "start_time": "2019-01-10T19:23:45.638Z"
    }
   },
   "outputs": [],
   "source": [
    "file1$group = as.factor(round(file1$time))\n",
    "file2$group = as.factor(round(file2$time))\n",
    "file3$group = as.factor(round(file3$time))\n",
    "file4$group = as.factor(round(file4$time))\n",
    "#(file1,20)\n",
    "\n",
    "#list of all files\n",
    "files = list(file1, file2, file3, file4)\n",
    "\n",
    "#loop through all files and combine\n",
    "features = NULL\n",
    "for (i in 1:4){\n",
    "res = files[[i]] %>%\n",
    "    group_by(group) %>%\n",
    "    summarize(ax_mean = mean(ax),\n",
    "              ax_sd = sd(ax),\n",
    "              ax_min = min(ax),\n",
    "              ax_max = max(ax),\n",
    "              ax_median = median(ax),\n",
    "              ay_mean = mean(ay),\n",
    "              ay_sd = sd(ay),\n",
    "              ay_min = min(ay),\n",
    "              ay_may = max(ay),\n",
    "              ay_median = median(ay),\n",
    "              az_mean = mean(az),\n",
    "              az_sd = sd(az),\n",
    "              az_min = min(az),\n",
    "              az_maz = max(az),\n",
    "              az_median = median(az),\n",
    "              aT_mean = mean(aT),\n",
    "              aT_sd = sd(aT),\n",
    "              aT_min = min(aT),\n",
    "              aT_maT = max(aT),\n",
    "              aT_median = median(aT),\n",
    "              label = mean(label)\n",
    "             )\n",
    "    features = rbind(features, res)\n",
    "} %>% as.data.frame()\n",
    "\n",
    "#view all features\n",
    "features$label = ifelse(features$label==1, \"OK\", ifelse(features$label==2, \"base\", ifelse(features$label==3, \"Imb-1\", \"Imb-2\")))\n",
    "features = features %>% na.omit() %>% mutate_at('label',as.factor)  %>% as.data.frame()\n",
    "\n",
    "#head(features)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create sample size for training the model\n",
    "From the information, we know that we have four states in the data. Based on this information, the data is split into train and test samples. The train set is used to build the model and test set is used to validate the model. The ratio between train and test is 80:20. You can adjust this based on type of data. The below table shows the number of observations for each group.   \n",
    "\n",
    "Note: It is adviced to have atleast 30 samples for each group. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-01-10T19:23:47.823447Z",
     "start_time": "2019-01-10T19:23:47.767Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       " base Imb-1 Imb-2    OK \n",
       "  109    93    93    67 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "table(features$label)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the above results, we can observe that there are atleast 30 samples for each group. Now, we can used this data to split into train and test set. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-01-10T19:25:44.315656Z",
     "start_time": "2019-01-10T19:25:44.255Z"
    }
   },
   "outputs": [],
   "source": [
    "#create samples of 80:20 ratio\n",
    "sample = sample(nrow(features) , nrow(features)* 0.70)\n",
    "\n",
    "train1 = features[sample,2:ncol(features)]\n",
    "test1 = features[-sample,2:ncol(features)]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "TRUE"
      ],
      "text/latex": [
       "TRUE"
      ],
      "text/markdown": [
       "TRUE"
      ],
      "text/plain": [
       "[1] TRUE"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "is.factor(train1[,'label'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ensemble Modeling\n",
    "\n",
    "### Fault Classification using Ensemble Models\n",
    "Ensemble models in machine learning combine the decisions from multiple models to improve the overall performance. The main causes of error in learning models are due to noise, bias and variance.\n",
    "Ensemble methods help to minimize these factors. These methods are designed to improve the stability and the accuracy of Machine Learning algorithms.\n",
    "\n",
    "### Train Multiple models at once"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# weights:  129\n",
      "initial  value 372.183353 \n",
      "iter  10 value 302.084608\n",
      "iter  20 value 224.113374\n",
      "iter  30 value 145.606094\n",
      "iter  40 value 87.015573\n",
      "iter  50 value 64.412881\n",
      "iter  60 value 62.601315\n",
      "iter  70 value 61.770846\n",
      "iter  80 value 61.605244\n",
      "iter  90 value 61.318439\n",
      "iter 100 value 61.036910\n",
      "final  value 61.036910 \n",
      "stopped after 100 iterations\n",
      "# weights:  129\n",
      "initial  value 417.012160 \n",
      "iter  10 value 319.687250\n",
      "iter  20 value 163.294440\n",
      "iter  30 value 85.300481\n",
      "iter  40 value 65.658585\n",
      "iter  50 value 63.225166\n",
      "iter  60 value 62.684881\n",
      "iter  70 value 62.401495\n",
      "iter  80 value 62.294629\n",
      "iter  90 value 62.255201\n",
      "iter 100 value 62.230713\n",
      "final  value 62.230713 \n",
      "stopped after 100 iterations\n",
      "# weights:  129\n",
      "initial  value 445.656478 \n",
      "iter  10 value 290.807505\n",
      "iter  20 value 171.806323\n",
      "iter  30 value 90.048886\n",
      "iter  40 value 65.562345\n",
      "iter  50 value 63.919257\n",
      "iter  60 value 63.582862\n",
      "iter  70 value 63.302945\n",
      "iter  80 value 62.265786\n",
      "iter  90 value 61.482850\n",
      "iter 100 value 60.910019\n",
      "final  value 60.910019 \n",
      "stopped after 100 iterations\n",
      "# weights:  129\n",
      "initial  value 380.050584 \n",
      "iter  10 value 223.563530\n",
      "iter  20 value 109.018976\n",
      "iter  30 value 81.668785\n",
      "iter  40 value 67.538336\n",
      "iter  50 value 66.692255\n",
      "iter  60 value 65.646220\n",
      "iter  70 value 65.190500\n",
      "iter  80 value 64.950361\n",
      "iter  90 value 64.484820\n",
      "iter 100 value 62.871993\n",
      "final  value 62.871993 \n",
      "stopped after 100 iterations\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "$summary\n",
       "           Accuracy     Kappa AccuracyLower AccuracyUpper AccuracyNull\n",
       "parRF     0.9633028 0.9501372     0.9087017     0.9899123    0.3211009\n",
       "knn       0.9633028 0.9501372     0.9087017     0.9899123    0.3211009\n",
       "svmRadial 0.9633028 0.9501372     0.9087017     0.9899123    0.3211009\n",
       "nnet      0.9633028 0.9501372     0.9087017     0.9899123    0.3211009\n",
       "          AccuracyPValue McnemarPValue\n",
       "parRF       1.895382e-46           NaN\n",
       "knn         1.895382e-46           NaN\n",
       "svmRadial   1.895382e-46           NaN\n",
       "nnet        1.895382e-46           NaN\n",
       "\n",
       "$models\n",
       "$models$parRF\n",
       "Neural Network \n",
       "\n",
       "253 samples\n",
       " 20 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 226, 228, 227, 228, 228, 228, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  size  decay  Accuracy   Kappa    \n",
       "  1     0e+00  0.8071300  0.7338399\n",
       "  1     1e-04  0.8692250  0.8206072\n",
       "  1     1e-01  0.9298206  0.9050334\n",
       "  3     0e+00  0.8958869  0.8588042\n",
       "  3     1e-04  0.9463675  0.9277615\n",
       "  3     1e-01  0.9561081  0.9409480\n",
       "  5     0e+00  0.9158579  0.8865881\n",
       "  5     1e-04  0.9510853  0.9340053\n",
       "  5     1e-01  0.9588952  0.9447588\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final values used for the model were size = 5 and decay = 0.1.\n",
       "\n",
       "$models$knn\n",
       "Neural Network \n",
       "\n",
       "253 samples\n",
       " 20 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 229, 228, 227, 227, 227, 227, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  size  decay  Accuracy   Kappa    \n",
       "  1     0e+00  0.8371856  0.7776150\n",
       "  1     1e-04  0.8939859  0.8554721\n",
       "  1     1e-01  0.9328117  0.9091617\n",
       "  3     0e+00  0.9148482  0.8846781\n",
       "  3     1e-04  0.9519482  0.9352587\n",
       "  3     1e-01  0.9546560  0.9390297\n",
       "  5     0e+00  0.9283364  0.9033456\n",
       "  5     1e-04  0.9460272  0.9271158\n",
       "  5     1e-01  0.9582791  0.9439202\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final values used for the model were size = 5 and decay = 0.1.\n",
       "\n",
       "$models$svmRadial\n",
       "Neural Network \n",
       "\n",
       "253 samples\n",
       " 20 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 228, 229, 228, 228, 228, 227, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  size  decay  Accuracy   Kappa    \n",
       "  1     0e+00  0.8152251  0.7469022\n",
       "  1     1e-04  0.8849571  0.8422764\n",
       "  1     1e-01  0.9288526  0.9036491\n",
       "  3     0e+00  0.9095010  0.8776976\n",
       "  3     1e-04  0.9380557  0.9166136\n",
       "  3     1e-01  0.9559429  0.9407917\n",
       "  5     0e+00  0.9303231  0.9060761\n",
       "  5     1e-04  0.9512366  0.9344031\n",
       "  5     1e-01  0.9594980  0.9455727\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final values used for the model were size = 5 and decay = 0.1.\n",
       "\n",
       "$models$nnet\n",
       "Neural Network \n",
       "\n",
       "253 samples\n",
       " 20 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 227, 228, 228, 228, 229, 227, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  size  decay  Accuracy   Kappa    \n",
       "  1     0e+00  0.8472056  0.7905724\n",
       "  1     1e-04  0.8964815  0.8592248\n",
       "  1     1e-01  0.9337588  0.9103683\n",
       "  3     0e+00  0.8986425  0.8624266\n",
       "  3     1e-04  0.9406647  0.9199172\n",
       "  3     1e-01  0.9531906  0.9370397\n",
       "  5     0e+00  0.9301852  0.9059272\n",
       "  5     1e-04  0.9528722  0.9365436\n",
       "  5     1e-01  0.9571935  0.9424301\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final values used for the model were size = 5 and decay = 0.1.\n",
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "mm = multipleModels(train = train1, test = test1, y = \"label\", models = c(\"parRF\", \"knn\", \"svmRadial\", \"nnet\"))\n",
    "mm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the above training, multiple models were trained and the trained tuning parametrics are as shown above. Now we can use the above trained models to create an ensemble. \n",
    "\n",
    "#### Ensemble training\n",
    "In Ensemble training, multiple models are trained in the previous step as input and are fed to a new model to predict the out come. ensembleTrain() function can be used as follows. The classification output is a confusion matrix and the results are as follows. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "$summary\n",
       "Confusion Matrix and Statistics\n",
       "\n",
       "          Reference\n",
       "Prediction base Imb-1 Imb-2 OK\n",
       "     base    35     0     0  1\n",
       "     Imb-1    0    29     0  1\n",
       "     Imb-2    0     2    24  0\n",
       "     OK       0     0     0 17\n",
       "\n",
       "Overall Statistics\n",
       "                                          \n",
       "               Accuracy : 0.9633          \n",
       "                 95% CI : (0.9087, 0.9899)\n",
       "    No Information Rate : 0.3211          \n",
       "    P-Value [Acc > NIR] : < 2.2e-16       \n",
       "                                          \n",
       "                  Kappa : 0.9501          \n",
       "                                          \n",
       " Mcnemar's Test P-Value : NA              \n",
       "\n",
       "Statistics by Class:\n",
       "\n",
       "                     Class: base Class: Imb-1 Class: Imb-2 Class: OK\n",
       "Sensitivity               1.0000       0.9355       1.0000    0.8947\n",
       "Specificity               0.9865       0.9872       0.9765    1.0000\n",
       "Pos Pred Value            0.9722       0.9667       0.9231    1.0000\n",
       "Neg Pred Value            1.0000       0.9747       1.0000    0.9783\n",
       "Prevalence                0.3211       0.2844       0.2202    0.1743\n",
       "Detection Rate            0.3211       0.2661       0.2202    0.1560\n",
       "Detection Prevalence      0.3303       0.2752       0.2385    0.1560\n",
       "Balanced Accuracy         0.9932       0.9613       0.9882    0.9474\n",
       "\n",
       "$model\n",
       "$model[[1]]\n",
       "Random Forest \n",
       "\n",
       "253 samples\n",
       "  4 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 227, 228, 228, 226, 228, 229, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  mtry  Accuracy   Kappa    \n",
       "  2     0.9921477  0.9894408\n",
       "  3     0.9921477  0.9894408\n",
       "  4     0.9921477  0.9894408\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final value used for the model was mtry = 2.\n",
       "\n",
       "\n",
       "$mm\n",
       "$mm$summary\n",
       "           Accuracy     Kappa AccuracyLower AccuracyUpper AccuracyNull\n",
       "parRF     0.9633028 0.9501372     0.9087017     0.9899123    0.3211009\n",
       "knn       0.9633028 0.9501372     0.9087017     0.9899123    0.3211009\n",
       "svmRadial 0.9633028 0.9501372     0.9087017     0.9899123    0.3211009\n",
       "nnet      0.9633028 0.9501372     0.9087017     0.9899123    0.3211009\n",
       "          AccuracyPValue McnemarPValue\n",
       "parRF       1.895382e-46           NaN\n",
       "knn         1.895382e-46           NaN\n",
       "svmRadial   1.895382e-46           NaN\n",
       "nnet        1.895382e-46           NaN\n",
       "\n",
       "$mm$models\n",
       "$mm$models$parRF\n",
       "Neural Network \n",
       "\n",
       "253 samples\n",
       " 20 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 226, 228, 227, 228, 228, 228, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  size  decay  Accuracy   Kappa    \n",
       "  1     0e+00  0.8071300  0.7338399\n",
       "  1     1e-04  0.8692250  0.8206072\n",
       "  1     1e-01  0.9298206  0.9050334\n",
       "  3     0e+00  0.8958869  0.8588042\n",
       "  3     1e-04  0.9463675  0.9277615\n",
       "  3     1e-01  0.9561081  0.9409480\n",
       "  5     0e+00  0.9158579  0.8865881\n",
       "  5     1e-04  0.9510853  0.9340053\n",
       "  5     1e-01  0.9588952  0.9447588\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final values used for the model were size = 5 and decay = 0.1.\n",
       "\n",
       "$mm$models$knn\n",
       "Neural Network \n",
       "\n",
       "253 samples\n",
       " 20 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 229, 228, 227, 227, 227, 227, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  size  decay  Accuracy   Kappa    \n",
       "  1     0e+00  0.8371856  0.7776150\n",
       "  1     1e-04  0.8939859  0.8554721\n",
       "  1     1e-01  0.9328117  0.9091617\n",
       "  3     0e+00  0.9148482  0.8846781\n",
       "  3     1e-04  0.9519482  0.9352587\n",
       "  3     1e-01  0.9546560  0.9390297\n",
       "  5     0e+00  0.9283364  0.9033456\n",
       "  5     1e-04  0.9460272  0.9271158\n",
       "  5     1e-01  0.9582791  0.9439202\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final values used for the model were size = 5 and decay = 0.1.\n",
       "\n",
       "$mm$models$svmRadial\n",
       "Neural Network \n",
       "\n",
       "253 samples\n",
       " 20 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 228, 229, 228, 228, 228, 227, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  size  decay  Accuracy   Kappa    \n",
       "  1     0e+00  0.8152251  0.7469022\n",
       "  1     1e-04  0.8849571  0.8422764\n",
       "  1     1e-01  0.9288526  0.9036491\n",
       "  3     0e+00  0.9095010  0.8776976\n",
       "  3     1e-04  0.9380557  0.9166136\n",
       "  3     1e-01  0.9559429  0.9407917\n",
       "  5     0e+00  0.9303231  0.9060761\n",
       "  5     1e-04  0.9512366  0.9344031\n",
       "  5     1e-01  0.9594980  0.9455727\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final values used for the model were size = 5 and decay = 0.1.\n",
       "\n",
       "$mm$models$nnet\n",
       "Neural Network \n",
       "\n",
       "253 samples\n",
       " 20 predictor\n",
       "  4 classes: 'base', 'Imb-1', 'Imb-2', 'OK' \n",
       "\n",
       "No pre-processing\n",
       "Resampling: Cross-Validated (10 fold, repeated 10 times) \n",
       "Summary of sample sizes: 227, 228, 228, 228, 229, 227, ... \n",
       "Resampling results across tuning parameters:\n",
       "\n",
       "  size  decay  Accuracy   Kappa    \n",
       "  1     0e+00  0.8472056  0.7905724\n",
       "  1     1e-04  0.8964815  0.8592248\n",
       "  1     1e-01  0.9337588  0.9103683\n",
       "  3     0e+00  0.8986425  0.8624266\n",
       "  3     1e-04  0.9406647  0.9199172\n",
       "  3     1e-01  0.9531906  0.9370397\n",
       "  5     0e+00  0.9301852  0.9059272\n",
       "  5     1e-04  0.9528722  0.9365436\n",
       "  5     1e-01  0.9571935  0.9424301\n",
       "\n",
       "Kappa was used to select the optimal model using the largest value.\n",
       "The final values used for the model were size = 5 and decay = 0.1.\n",
       "\n",
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "em = ensembleTrain(mm, train =train1, test =test1, y = \"label\", emsembleModelTrain = \"rf\")\n",
    "em"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thanks for staying till here. \n",
    "\n",
    "### Conclusion\n",
    "Ensemble models are pretty famous and used everywhere. They always don't promise high accuracy but, the results can sometimes be unsual. Please note that the models are very difficult to interpret and sometimes unexplainable. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
